My key advice to you
Communication is key
\(H_0\): The null hypothesis, no effect
\(H_1\): The alternative hypothesis, there is an effect
We run a test, we get a p-value, say \(0.03\). It is a probability.
Probability of what, exactly?
Probability that \(H_0\) is true (probability that there is no difference), given the data
Probability that \(H_1\) is true (probability that there is a difference), given the data
Probability that the data is random
Probability that the observations are due to random chance
Probability of getting the same data by random chance
| Frequentist Statistics | Bayesian Statistics |
|---|---|
| 1. Probability is defined as the long-run frequency of events | 1. Probability represents a degree of belief or certainty about an event |
| 2. Parameters (like the “true value”) are fixed but unknown quantities. | 2. Parameters are treated as random variables with their own probability distributions. |
| 3. Asking about the probability of a hypothesis does not make sense | 3. Asking about the probability of a hypothesis is the main goal |
P-values are part of scientific language
Know their limits:
In a COVID-19 study, COVID-19 patients were compared with non-COVID-19 patients in each of two groups of people, G1 and G2.
We wanted to know whether the influence of COVID-19 is different in these two groups.
Groups G1 and G2 were randomly drawn from the same population. They were not different at all.
The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant
(Andrew Gelman and Hal Stern)
If a gene is significant in one comparison, and not significant in another, that does not mean that there is a difference between the two groups.
It may simply mean that we failed to detect the difference in one of the comparisons, and that is actually quite likely to happen!
Therefore:
Don’t say “there is no difference”. Say “we did not detect a difference”.
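How likely such a disagreement is can be checked with a small simulation. The sketch below is only an illustration under assumed settings, none of which come from the study: the same true effect (0.5 standard deviations) in both groups, 30 patients and 30 controls per group, and a z-test with known standard deviation. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

def z_test_p(x, y, sd=1.0):
    """Two-sided p-value of a two-sample z-test (sd assumed known)."""
    se = sd * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def discordant_fraction(effect=0.5, n=30, n_sim=2000, alpha=0.05, seed=1):
    """Fraction of simulated studies in which exactly one of two
    comparisons is significant, although the true effect is identical."""
    rng = random.Random(seed)

    def comparison_is_significant():
        # Assumed data model: normal values, patients shifted by `effect`.
        patients = [rng.gauss(effect, 1.0) for _ in range(n)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n)]
        return z_test_p(patients, controls) < alpha

    discordant = 0
    for _ in range(n_sim):
        sig_g1 = comparison_is_significant()  # comparison within G1
        sig_g2 = comparison_is_significant()  # comparison within G2
        discordant += sig_g1 != sig_g2
    return discordant / n_sim

print(discordant_fraction())
```

With these assumed settings each comparison has a power of roughly 50%, so about half of the simulated studies should show "significant in G1, not significant in G2" or the reverse, even though G1 and G2 do not differ at all.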
Attempt to replicate 53 high-impact cancer biology papers:
“Second, none of the 193 experiments were described in sufficient detail in the original paper to enable us to design protocols to repeat the experiments, so we had to seek clarifications from the original authors.” (Errington et al., 2021)
You can find a longer presentation along with its source code at https://github.com/bihealth/howtotalk
Parts of this have been expanded to a longer text which can be found at https://bihealth.github.io/howtotalk-book/
A 5-day R crash course book is available at https://bihealth.github.io/RCrashcourse-book/
flowchart LR
A(Program + Text) -->|knitr| B(Text with analysis results)
B --> C[LaTeX]
C --> CC[PDF]
B --> D[Word]
B --> E[HTML]
B --> F[Presentation]
B --> G[Book]
This can be R Markdown, Quarto, Jupyter… the goal is that your code and your text are in one place, and the results of your calculations are entered automatically into the text.
In systems such as R Markdown, you can put your analysis results directly into your text. For example, when I write that the \(p\)-value is equal to 0.05, I am writing this:
The \(p\)-value above is not entered manually (as 0.05), but is the result of a statistical computation. If the data changes, if your analysis changes, the \(p\)-value above will automatically change as well.
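The same idea can be sketched in plain Python, as a rough analogy to inline code in R Markdown or Quarto. Everything in this sketch is an assumption made for illustration: the measurements are made up, the function names are hypothetical, and a z-test with an assumed known standard deviation of 1 stands in for a real analysis.

```python
from statistics import NormalDist, mean

def two_sided_p(x, y, sd=1.0):
    """Two-sided p-value of a two-sample z-test (sd assumed known, for brevity)."""
    se = sd * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def report(treated, control):
    """Build the sentence from the data; no number is typed by hand."""
    p = two_sided_p(treated, control)
    return f"The p-value is equal to {p:.2g}."

# Hypothetical measurements, made up for this illustration only.
control = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9]
treated = [5.9, 6.2, 5.8, 6.4, 6.0, 6.1]

print(report(treated, control))
# If the data change, the sentence changes with them.
print(report(treated + [4.0, 4.2], control))
```

The number in the sentence is never typed in; it is formatted from the computed value each time the document is rendered.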
1, 2, 3, …)
If you call a sample S1, then always refer to that sample as S1, not Smp. 1 or 1 or Sample 1
RCDB2024_S1, RCDB2024_S2, RCDB2025_S1, RCDB2025_S2
WT_treatment_1, KO_control_2 and not WT_treatment_1, KO-2-control, WT_ctrl2
Never encode information as formatting, always use explicit columns
Color / font size / font style cannot be read automatically
Make a separate column for comments
Otherwise the values might be lost
Make a separate Excel sheet for column meta information
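Consistent identifiers can also be checked automatically before a table is shared. The sketch below is a minimal illustration; the naming scheme (genotype, condition, replicate number, joined by underscores) and the function name are assumptions chosen to match the example identifiers above, not a fixed standard.

```python
import re

# Assumed naming scheme for this illustration: genotype_condition_replicate.
SAMPLE_ID = re.compile(r"(WT|KO)_(treatment|control)_\d+")

def inconsistent_ids(ids):
    """Return the identifiers that do not follow the agreed pattern."""
    return [i for i in ids if SAMPLE_ID.fullmatch(i) is None]

print(inconsistent_ids(["WT_treatment_1", "KO_control_2"]))
# → []
print(inconsistent_ids(["WT_treatment_1", "KO-2-control", "WT_ctrl2"]))
# → ['KO-2-control', 'WT_ctrl2']
```

A check like this only works because the identifiers are plain text in an explicit column; information stored as color or font style would not be visible to it.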
Subject. One mature Atlantic Salmon (Salmo salar) participated in the fMRI study. The salmon was approximately 18 inches long, weighed 3.8 lbs, and was not alive at the time of scanning.
Task. The task administered to the salmon involved completing an open-ended mentalizing task. The salmon was shown a series of photographs depicting human individuals in social situations with a specified emotional valence. The salmon was asked to determine what emotion the individual in the photo must have been experiencing.
Nieuwenhuis et al. found that half of the scientists who could have committed this error did in fact commit it.
Core Unit for Bioinformatics, BIH@Charite